What May Visualization Processes Optimize?

Min Chen, Member, IEEE, and Amos Golan 

Abstract —In this paper, we present an abstract model of visualization and inference processes and describe an information-theoretic 
measure for optimizing such processes. In order to obtain such an abstraction, we first examined six classes of workflows in data 
analysis and visualization, and identified four levels of typical visualization components, namely disseminative, observational, analytical
and model-developmental visualization. We noticed a common phenomenon at different levels of visualization, that is, the
transformation of data spaces (referred to as alphabets) usually corresponds to the reduction of maximal entropy along a workflow. 
Based on this observation, we establish an information-theoretic measure of cost-benefit ratio that may be used as a cost function 
for optimizing a data visualization process. To demonstrate the validity of this measure, we examined a number of successful visualization
processes in the literature, and showed that the information-theoretic measure can mathematically explain the advantages of
such processes over possible alternatives. 

Index Terms —Visualization, visual analytics, information theory, theory of visualization, cost-benefit ratio, process optimization. 
1 Introduction

Over the past 25 years, the field of visualization has developed to encompass three major subfields, namely scientific visualization, information visualization and visual analytics, as well as many domain-specific areas, such as geo-information visualization, biological data visualization, software visualization, and others. A number of pipelines have been proposed for visualization in general (e.g., [44,62,63]) and for visual analytics in particular [29,40]. In practice, a visualization workflow normally includes machine-centric components (e.g., statistical analysis, rule-based or policy-based models, and supervised or unsupervised models) as well as human-centric components (e.g., visualization, human-computer interaction, and human-human communication). The integration of these two types of components has become increasingly common since visual analytics [59,75] became a de facto standard approach for handling large volumes of complex data.

Given a visualization workflow in a specific context, it is inevitable that one would like to improve its cost-benefit ratio, from time to time, in relation to many factors such as accuracy, speed, computational and human resources, credibility, logistics, changes in the environment, data or tasks concerned, and so forth. Such improvement can typically be made through introducing new technologies, restructuring the existing workflow, or re-balancing the tradeoff between different factors. While it is absolutely essential to optimize each visualization workflow in a heuristic and case-by-case manner [45], it is also desirable to study process optimization theoretically and mathematically through abstract reasoning. In many ways, this is similar to process optimization in tele- and data communication, where each subsystem is optimized through careful design and customization but the gain in cost-benefit is mostly underpinned by information theory [18,53]. In this paper, we study, in abstraction, the process optimization in visualization from an information-theoretic perspective.

Visualization is a form of information processing. Like other forms of information processing (e.g., statistical inference), visualization enables the transformation of information from one representation to another. The objective of such a transformation is typically to infer a finding, judgment or decision from the observed data, which may be incomplete and noisy. The input to the transformation may also include “soft” information and knowledge, such as known theories, intuition, belief, value judgment, and so on. Another form of input, which is often referred to as priors, may come from knowledge about the system where the data are captured, facts about the system or related systems, previous observations, experimentations, analytical conclusions, etc. Here we use the terms data, information and knowledge according to the commonly-used definitions in computational spaces [12].

All inferential processes are designed for processing a finite amount of information. In practice, they all encounter some difficulties, such as the lack of adequate techniques for extracting meaningful information from a vast amount of data; incomplete, incorrect or noisy data; biases encoded in computer algorithms or biases of human analysts; lack of computational or human resources; urgency in making a decision; and so on. All inferential problems are inherently under-determined problems [25,26].

The traditional machine-centric solutions to the inferential problem address these difficulties by imposing certain assumptions and structures on the model of the system where the data are captured. If these assumptions were correctly specified and these structures were perfectly observed, computed inference based on certain statistics (e.g., moments) would provide us with perfect answers. In practice, it is seldom possible to transform our theory, axioms, intuition and other soft information into such statistics. Hence optimization of a visualization process is not just about the best statistical method, the best analytical algorithm, or the best machine learning technique. It is also about the best human-centric mechanisms for enabling the use of “soft” information and knowledge.

In this paper, we propose to measure the cost-benefit of a visualization-assisted inference process within an information-theoretic framework. The work is built on a wealth of literature on visualization and visualization pipelines (e.g., [29,40,44,62,63]) and on information-theoretic measures and inference in statistics and econometrics [27, 28, 34]. It is a major extension of the information-theoretic framework for visualization proposed by Chen and Jänicke [14], and a major extension of statistical inference and information processing in general (e.g., [25]). Our contributions are:

• We propose a new categorization of visualization workflows and identify four levels of visualization commonly featured in different data analysis and visualization processes (Section 3).

• We present an information-theoretic abstraction of visualization processes as the transformation of alphabets along a workflow for data analysis and visualization, and identify a common trend of reduction of Shannon entropy (i.e., uncertainty) in such workflows (Section 4).

• We propose an information-theoretic measure of cost-benefit, 
which can be applied to the whole workflow as well as individual 
processing steps (Section 4). 

• We demonstrate that this cost-benefit measure can explain 
the information-theoretic advantages of successful visualization 
workflows in the literature, suggesting that it can be used for 
optimizing a visualization-assisted inference process through a 
combination of quantitative and qualitative analysis (Section 5). 



2 Related Work 

In 2003, Grinstein et al. [31] posed an intriguing question about usability vs. utility when they considered visualization as an interface technology that draws from both machine- and human-centric capabilities. This is a question about optimization.

Pipelines and Workflows. In the field of visualization, many have considered pipelines or workflows that feature components such as analysis, visualization and interaction. Upson et al. provided one of the earliest abstractions of a pipeline with four main components: data source, filtering and mapping, rendering, and output [62]. Wood et al. proposed an extension for collaborative visualization in the form of parallel pipelines [76]. van Wijk outlined a two-loop pipeline, bringing interaction and cognition into a visualization process [63]. Green et al. proposed a revision of this pipeline [29]. Keim et al. proposed a pipeline featuring two interacting parallel components for data mining models and visual data exploration respectively [40]. Jänicke et al. examined several pipelines for comparative visualization, and discussed quality metrics for evaluating the reconstructibility of visualization [36]. Bertini et al. proposed an automated visualization pipeline driven by quality metrics [4]. Recently Moreland surveyed visualization pipelines mainly in the context of scientific visualization [44]. There are many other variations of visualization pipelines in the literature, such as [11,13,16,32,38]. All these discussions on visualization pipelines pointed out one common fact, i.e., visualization processes can be broken down into steps, which may be referred to as transformations or mappings. This work considers this ubiquitous feature of visualization in abstraction.

Design Methods and Processes. Abram and Treinish proposed to implement visualization processes on data-flow architectures [1]. Chi described visualization processes using a state reference model, involving data, visualization, and visual mapping transformation [16]. Jansen and Dragicevic proposed an interaction model in the context of visualization pipelines [39]. Munzner proposed a nested model for designing and developing visualization pipelines [45]. Wang et al. proposed a two-stage framework for designing visual analytics systems [71]. Ahmed et al. proposed to use purpose-driven games for evaluating visualization systems [2]. Scholtz outlined a set of guidelines for assessing visual analytics environments [51], and Scholtz et al. further developed them into an evaluation methodology [52]. The theoretic abstraction presented in this paper is built on these works, and complements them by offering a mathematical rationalization for good practices in designing and assessing visualization systems.

Theories of Visualization and their Applications. In developing theories of visualization, much effort has been made in formulating categorizations and taxonomies (e.g., [3, 60, 73]). Some 25 different proposals are listed in [14, 15]. In addition, a number of conceptual models have been proposed, including an object-oriented model by Silver [55], feature extraction and representation by van Walsum et al. [67], visualization exploration by Jankun-Kelly et al. [38], a distributed cognition model by Liu et al. [43], a predictive data-centered theory by Purchase et al. [48], the Visualization Transform Design Model by Purchase et al. [48], a cognition model for visual analytics by Green et al. [30], sensemaking and model steering by Endert et al. [21], modelling visualization using semiotics and category theory by Vickers et al. [65], composition of visualization tasks by Brehmer and Munzner [9], and visual embedding by Demiralp et al. [20]. Recently, Sacha et al. proposed a knowledge generation model [50], introducing a visual analytics model with exploration and verification loops. The deliberations in these works represent qualitative abstractions of visualization processes.

Meanwhile, the development of mathematical frameworks has been gathering pace in recent years. One of these is the information-theoretic framework, which was initially suggested by Ward [48], then generalized and detailed by Chen and Jänicke [14], and further enriched by Xu et al. [77] and Wang and Shen [69] in the context of scientific visualization. Another is the algebraic framework proposed by Kindlmann and Scheidegger [41], who justifiably placed their focus on visual mappings, which are inherently the most important transformations from a visualization perspective. While an algebraic formulation typically describes mappings between set members (e.g., from a pair of datasets to a pair of visual representations in [41]), an information-theoretic formulation describes mappings between sets together with the probability distributions of their members.

This holistic nature of information-theoretic reasoning has enabled many applications in visualization, including light source placement by Gumhold [33], view selection in mesh rendering by Vázquez et al. [64] and Feixas et al. [22], view selection in volume rendering by Bordoloi and Shen [5], and Takahashi and Takeshima [56], focus of attention in volume rendering by Viola et al. [66], multi-resolution volume visualization by Wang and Shen [68], feature highlighting in unsteady multi-field visualization by Jänicke and Scheuermann [35,37], feature highlighting in time-varying volume visualization by Wang et al. [70], transfer function design by Bruckner and Möller [10], and by Ruiz et al. [8,49], multimodal data fusion by Bramon et al. [6], evaluating isosurfaces [74], measuring observation capacity [7], measuring information content in multivariate data [23], and confirming the mathematical feasibility of visual multiplexing [15].

3 Workflows in Visualization 
3.1 Six Classes of Workflows 

Consider a broad range of workflows in visualization, including those historically referred to as analysis, inference, simulation or visual analytics as well as those that have emerged recently, as long as they feature a component of visualization, i.e., mapping some data to alternative visual representations. As a process of abstraction, we group these workflows into six classes as illustrated in Fig. 1. They feature the following types of components:

• Machine Processing (M): These are computational processes executed by computers including, for instance, computation of statistical indicators (e.g., mean, correlation index, etc.), data analysis (e.g., classification, anomaly detection, association analysis, etc.), simulation, prediction, recommendation and so on. Each computational process is defined by a program that may encode a theoretic or heuristic model, which we refer to generally as a Model.

• Human Processing (H): These are human cognitive processes and related activities including, for instance, viewing, reasoning, memorizing, discussing, decision making and so on.

• Visual Mapping (V): These are processes where data are transformed to alternative visual representations to be viewed by humans. We purposely treat these processes separately from M and H, and assume that visual representations can be generated by many means, from hand-drawn plots and illustrations to automated generation of visualization.

• Interaction (I): These are actions taken by humans to influence an M or V process. They include typical interactions in visualization [78], such as parameter adjustment, and model creation and refinement. In Fig. 1, they are not explicitly shown as a processing block, as the main cognitive processing for interaction is assumed to take place in H. Instead, they are indicated by a solid bar on a connection.

Workflow class W1 encompasses perhaps some of the most common processes in data analysis and visualization. In this process, one or more human analysts (H) process the input data with or without the aid of computation, gain some understanding, create some visualization (V) and convey the understanding to others (H). Many visualization images in the media and visual illustrations in user manuals fall into this class. The goal of visualization is to pass on known information and knowledge to others, and the dissemination process is almost always accompanied by written or verbal commentaries describing the understanding and/or opinions of analysts. We refer to this form of visualization as Disseminative Visualization, and represent the visualization part of the workflow as a macro block Vd.

The second class, W2, encompasses many operational processes, where human analysts need to use visualization to observe data routinely. For example, stockbrokers frequently glance at various time series plots, drivers glance at their GPS-navigation devices regularly, neurologists examine visual representations of various scans (e.g., electroencephalography, computed tomography, diffusion tensor imaging, etc.) of their patients, and computational scientists visualize simulation results after each run. The goal of visualization is to enable intuitive and speedy observation of features, phenomena and events captured in the data, and to provide external memorization of what has been observed. We refer to this form of visualization as Observational Visualization, and represent this as a macro block Vo. Although the two macro blocks Vd and Vo appear to be similar except for an extra forward transition in Vd, their fundamental difference is that in Vd analysts have already gained the understanding to be conveyed before the visualization is generated, while in Vo visualization is generated in order to gain a new understanding. Of course, Vo can be followed by Vd to disseminate such a new understanding.

Fig. 1. Six typical workflows in data analysis and visualization. The subgraphs Vd, Vo, Va, and Vm represent four levels of visualization.

Workflow W3 depicts a class of processes where automated data analysis plays a dominant role, and humans are only the destination of dissemination. In many ways, W3 is almost identical to W1, except that in W3 the understanding and/or opinions conveyed to humans through Vd are from machine processing. Such a workflow has its place in data analysis and visualization when the machine is always or almost always correct about what is being conveyed. When such a high level of correctness is not assured, it is necessary to increase humans’ involvement in these processes.

This leads to workflow class W4, where human analysts are able to observe input data in conjunction with the machine’s “understanding”. In many ways, this workflow is similar to the parallel pipeline proposed by Keim et al. [40]. It allows analysts to receive computational results from machine processing, while evaluating the correctness of the results and identifying possible false positives and negatives. For example, in much investigative analysis for examining and understanding complex relationships among data objects, the amount of input data often makes direct observation time-consuming. The machine processing hence enables the analysts to prioritize their effort and structure their reasoning and decision-making process. At the same time, analysts are able to explore the data and adjust the model depending on their judgment about the quality of the computed results. We refer to this form of visualization as Analytical Visualization, and represent this as a macro block Va.

When the correctness or accuracy of a model is the main concern, the focus of visualization is shifted to assisting analysts in improving an existing model or creating a new model. Both workflow classes W5 and W6 represent such a focus. In W5, analysts first observe some input data, and then identify an existing model or formulate a new one for processing the data. Tasks for such processing may include, but are not limited to, computing statistical indicators; detecting features, objects, and events; identifying patterns, associations, and rules; and making predictions and recommendations. In many cases, W5 may represent a long-term process for developing a theory and its applications, such as physical laws and their applications in computer simulation. By contrast, W6 represents a class of commonly occurring workflows where analysts deploy known theories to specify a model without the initial observational visualization for establishing these theories. In practice, to create, test and optimize a model, analysts often make use of W5 and W6 for different parts of a model. For example, in developing a simulation model, major computation steps are defined according to known quantitative laws, while initial and boundary conditions are defined based on observations and experiments. We thereby refer to these two forms of visualization collectively as Model-developmental Visualization, and represent them as the same macro block Vm. Note that we have avoided the phrase “modelling visualization” here as it could be misread as an action “to model visualization”. One day, there might be a new adjective, e.g., in the form of modelative or modelary.

3.2 Four Levels of Visualization 

The four macro blocks, namely Vd, Vo, Va, and Vm, can be seen as four levels of visualization. The different levels, which are summarized below, reflect the complexity of visualization tasks from the perspective of analysts.

• Level 1: Disseminative Visualization (Vd). Visualization is a presentational aid for disseminating information or insight to others. The analyst who created the visualization does not have a question about the data, except for informing others: “This is A!” where A may be a fact, a piece of information, an understanding, etc. At this level, the complexity for the analyst to obtain an answer about the data is $O(1)$. Here we make use of the big O notation in algorithm and complexity analysis.

• Level 2: Observational Visualization (Vo). Visualization is an operational aid that enables intuitive and/or speedy observation of captured data. It is often a part of the routine operations of an analyst, and the questions to be answered may typically be in the forms of “What has happened?” and “When and where did A, B, C, etc., happen?” At this level, the observation is usually sequential, and thus the complexity is generally $O(n)$, where $n$ is the number of data objects. Broadly speaking, a data object is a data record. We will give a more precise definition of it in Section 4.

• Level 3: Analytical Visualization (Va). Visualization is an investigative aid for examining and understanding complex relationships (e.g., correlation, association, causality, contradiction). The questions to be answered are typically in the forms of “What does A relate to?” and “Why?” Given $n$ data objects, the number of possible $k$-relationships among these data objects is at the level of $O(n^k)$ $(k \geq 2)$. For a small $n$, it may be feasible to examine all $k$-relationships using observational visualization. When $n$ increases, it becomes necessary to use analytical models to prioritize the analyst’s investigative effort. Most visual analytics processes reported in the recent literature operate at this level.

• Level 4: Model-developmental Visualization (Vm). Visualization is a developmental aid for improving existing models, methods, algorithms and systems, as well as for creating new ones. The questions to be answered are typically in the forms of “How does A lead to B?” and “What are the exact steps from A to B?” If a model has $n$ parameters and each parameter may take $k$ values, there are a total of $k^n$ combinations. In terms of complexity, this is $O(k^n)$. If a model has $n$ distinct algorithmic steps, the complexity of their ordering is $O(n!)$. Model-developmental visualization is a great challenge in the field of visualization.
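The growth rates above can be made concrete with a small Python sketch that counts the size of each level's answer space. The function name and parameter values are illustrative assumptions. Note that the text uses k for two different quantities (the arity of a relationship at Level 3 and the number of values per parameter at Level 4), so the sketch gives them separate names; the exact number of k-relationships is taken to be the binomial coefficient C(n, k), which grows as O(n^k) for fixed k.

```python
from math import comb, factorial

def answer_space_sizes(n, arity=2, values_per_param=8):
    """Illustrative answer-space sizes for the four levels of visualization.

    n                -- data objects (Levels 2 and 3) or model parameters /
                        algorithmic steps (Level 4)
    arity            -- k at Level 3: objects involved in one relationship
    values_per_param -- k at Level 4: values each parameter may take
    """
    return {
        "level1": 1,                             # O(1): the answer is already known
        "level2": n,                             # O(n): observe each object once
        "level3": comb(n, arity),                # C(n, k) k-relationships: O(n^k)
        "level4_params": values_per_param ** n,  # parameter combinations: O(k^n)
        "level4_orders": factorial(n),           # orderings of n steps: O(n!)
    }

for level, size in answer_space_sizes(10).items():
    print(f"{level:>14}: {size:,}")
```

With n = 10, there are 45 pairwise relationships, whereas 10 parameters with 8 values each already yield 8^10 = 1,073,741,824 combinations, and 10 algorithmic steps admit 3,628,800 orderings.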

Hence the levels correspond to the questions to be asked and the complexity of the space of possible answers. For example, given a financial prediction model, if an analyst uses visualization to demonstrate its effectiveness to an audience, it falls into workflow class W3. This is level 1 visualization, as the analyst knows or assumes the model to be correct.

If the analyst sequentially observes a financial data stream and some basic statistics about the data in order to capture some events, the process more or less follows workflow W2, and it is level 2 visualization.

If the analyst applies a prediction model to the input data streams, and then uses visualization to observe the input data and its basic statistics, to receive the predictions and recommendations computed by a machine process, and to reason about potential errors, such a process is encapsulated by W4. The analysis of errors and noise typically involves examination of the relationships among different events in the input data streams, statistical indicators, computed trends and recommendations. It is more complex than observing events in a data stream sequentially. This is level 3 visualization.

If the analyst identifies that a prediction model does not perform satisfactorily, and attempts to optimize it by, for example, experimenting with various parameters in the model, this falls into workflow class W5. Alternatively, the analyst may wish to create a new prediction model based on a different economic theory; this falls into workflow class W6. When visualization is used to assist the analyst in exploring the parameter space or the model space, this is level 4 visualization.

4 An Information-Theoretic Abstraction 
4.1 Alphabets and Letters 

The term data object is an encompassing generalization of datum, data point, data sample, data record and dataset. It contains a finite collection of quantitative and/or qualitative measures that are values of a finite set of variables. For example, consider a univariate variable $X$ for recording the population of a country. A value representing the UK population in 2010 is a datum, and thus a data object. A collection of the individual population figures of $N$ countries in 2010 is also a data object, where the $N$ values may be considered as a sample of data points of $X$, or separate records of $N$ variables $X_i$ $(i = 1, 2, \ldots, N)$. Similarly, a time series recording the UK annual population between 1900 and 2010 is a data object. The 111 values in the time series may be considered as data points of the same univariate variable $X$, or a multivariate record for time-specific variables $X_t$ $(t = 1900, 1901, \ldots, 2010)$. Of course, the term data object can also refer to a multivariate data point that consists of values representing conceptually-different variables, such as the area, population and GDP of a country.

The generalization also encompasses datasets that are often regarded as “unstructured”. For example, a piece of text may be treated as a multivariate record of $M$ characters, each of which is a value of a variable $C_j$ for encoding a letter, digit or punctuation mark at a specific position $j$ $(j = 1, 2, \ldots, M)$ within the text. Hence, the multivariate record is a data object. Alternatively, we can consider a composite variable, $Y$, which encodes all possible variations of texts with $M$ or fewer characters. A specific text with $k$ characters $(1 \leq k \leq M)$ is thus a value of $Y$. This example also illustrates the equivalence between encoding a data object as a multivariate data record or encoding it as an instance of a single composite variable.
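The equivalence claimed above can be illustrated with a small bijection. The Python sketch below assumes a hypothetical three-symbol character set and maps any text of 1 to M characters (a multivariate record of positional variables) to a single positive integer (one value of the composite variable Y) and back. It uses bijective base-c numeration; this particular encoding is our choice for illustration, not one prescribed in the text.

```python
CHARSET = "abc"  # hypothetical character set with c = 3 symbols

def text_to_value(text, charset=CHARSET):
    """Encode a multivariate record (C_1, ..., C_k) as one value of Y.

    Uses bijective base-c numeration: each character is a digit 1..c, so
    every non-empty text maps to a distinct positive integer.
    """
    c = len(charset)
    value = 0
    for ch in text:
        value = value * c + charset.index(ch) + 1
    return value

def value_to_text(value, charset=CHARSET):
    """Decode a value of Y back into its multivariate record."""
    c = len(charset)
    chars = []
    while value > 0:
        value, digit = divmod(value - 1, c)
        chars.append(charset[digit])
    return "".join(reversed(chars))

def composite_alphabet_size(c, max_len):
    """|Y|: the number of texts with 1 to max_len characters."""
    return sum(c ** k for k in range(1, max_len + 1))

print(text_to_value("cab"), value_to_text(text_to_value("cab")))  # 32 cab
```

With c = 3 and M = 3, the values 1 to composite_alphabet_size(3, 3) = 39 correspond one-to-one to the 39 texts of one to three characters, so nothing is lost by switching between the two encodings.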

In this generalized context, let $Z$ be a variable, and $\mathbb{Z} = \{z_1, z_2, \ldots, z_n\}$ be the set of all its valid values. $Z$ may be a univariate, multivariate, or composite variable. When $Z$ is a multivariate variable, each of its valid values, $z_i$, is a valid combination of valid values of individual univariate variables. When $Z$ is a composite variable, we can flatten its hierarchy by encoding the hierarchical relationships explicitly using additional variables. The flattened representation thus represents a multivariate variable. Here, $z_i$ is a valid combination of valid values of individual variables including the additional ones. In information theory, such a set $\mathbb{Z}$ is referred to as an alphabet, and each of its members $z_i$ as a letter.
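The construction of a multivariate alphabet described above can be sketched in Python: take the Cartesian product of the valid values of the univariate variables, then keep only the combinations that are valid letters. The two variables reuse feature names from the share-price example in Section 4.2, and the validity rule is purely hypothetical.

```python
from itertools import product

# Hypothetical univariate variables and their valid values.
VARIABLES = {
    "trend": ("rise", "fall", "flat"),
    "speed": ("slow", "medium", "fast"),
}

def multivariate_alphabet(variables, is_valid=lambda letter: True):
    """Alphabet of a multivariate variable Z.

    Every letter is one combination of values, one per univariate variable;
    `is_valid` drops combinations that are not valid letters of Z.
    """
    names = list(variables)
    alphabet = []
    for values in product(*(variables[name] for name in names)):
        letter = dict(zip(names, values))
        if is_valid(letter):
            alphabet.append(letter)
    return alphabet

# Assumed validity rule: a flat series cannot also be changing fast.
letters = multivariate_alphabet(
    VARIABLES, lambda z: not (z["trend"] == "flat" and z["speed"] == "fast"))
print(len(letters))  # 8 of the 3 x 3 = 9 combinations are valid
```

A composite variable would be handled the same way once its hierarchy has been flattened into additional variables, as the text describes.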

When the probability of every letter, $p(z_i)$, is known or can be estimated, $p$ is the probability mass function for the set $\mathbb{Z}$. Shannon introduced the measure of entropy

$$\mathcal{H}(\mathbb{Z}) = -\sum_{i=1}^{n} p(z_i) \log_2 p(z_i)$$

for describing the level of uncertainty of an alphabet. With the above $\log_2$-based formula, the unit of $\mathcal{H}(\mathbb{Z})$ is the bit.
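The entropy formula translates directly into code. This Python sketch, with a hypothetical function name, takes the probability mass function as a list of probabilities and applies the usual convention that letters with zero probability contribute nothing. A uniform distribution over n letters gives the maximal entropy log2 n, which is the quantity used throughout Section 4.2.

```python
import math

def shannon_entropy(pmf):
    """Shannon entropy H(Z), in bits, of a probability mass function.

    `pmf` holds p(z_i) for every letter z_i of the alphabet. Letters with
    p = 0 are skipped, following the convention 0 * log2(0) = 0.
    """
    probs = list(pmf)
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("pmf must be non-negative and sum to 1")
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.25] * 4))            # 2.0, i.e. log2(4), the maximum for 4 letters
print(shannon_entropy([0.7, 0.1, 0.1, 0.1]))  # below 2: a skewed pmf is less uncertain
```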

4.2 Transformation of Alphabets 

In many data-intensive environments, the alphabet of raw input data 
may contain numerous letters. For example, consider all valid time 
series of share prices within one hour period. Assuming that the share 
price is updated every 5 seconds, there are 720 data points per time se¬ 
ries. Assuming that we represent share price at USD $0.01 resolution 
using 32-bit unsigned integers, the minimum and maximum values are 
thus 0 and 2 32 — 1 cents respectively. (Note: Historically the high¬ 
est share price in the US is 347,600 cents, i.e., 2 18 < 347,600 < 2 39 ). 
If the probability of different time series were uniformly distributed, 
the entropy of this alphabet would be 23040 = 720 x log 2 (2 32 ) bits. 
This is the maximal entropy of this alphabet. In practice, as many high 
values in the range [0,2 3 “ — 1] are very unlikely, and sudden changes 
between a very low value and a very high value (or vice versa) during 
a short period are also rare, the actual entropy is lower than 23040 bits. 

On the other hand, if we need to consider $r$ of such time series in order to make a decision, the size of the new alphabet will increase significantly. Although some combinations of $r$ time series may be highly improbable, they may still be valid letters. Hence the maximal entropy of this new alphabet is $23040r$ bits. Let us consider such $r$ time series as the initial raw data for a data analysis and visualization process as illustrated in Fig. 2.
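Because the raw alphabet is a Cartesian product of per-point value ranges, its maximal entropy is simply the base-2 logarithm of its size, and the figures above can be checked without enumerating any letters. The following Python sketch assumes, as the text does, that every value combination is a valid letter and that all letters are equally likely; the helper names are hypothetical. Python's arbitrary-precision integers hold the alphabet size exactly, and `math.log2` accepts such large integers.

```python
import math

def raw_alphabet_size(points_per_series=720, values_per_point=2**32, r=1):
    """Number of letters when every value combination of r series is valid."""
    return values_per_point ** (points_per_series * r)

def max_entropy_bits(alphabet_size):
    """Maximal entropy log2(|Z|), reached when all letters are equally likely."""
    return math.log2(alphabet_size)

print(max_entropy_bits(raw_alphabet_size()))      # 23040.0 for one series
print(max_entropy_bits(raw_alphabet_size(r=10)))  # 230400.0 for r = 10 series
print(math.log2(347_600))  # about 18.4, so 2**18 < 347,600 < 2**19
```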

One may find that the resolution of 1 data point per 5 seconds is not necessary, and choose to reduce it to 1 data point every minute by computing the average of the 12 data points in each minute. The average values may also be stored using 32-bit unsigned integers. This aggregation results in a new alphabet, whose maximal entropy is $1920r = r \times 60 \times \log_2(2^{32})$ bits. When we use line plots to visualize these $r$ time series, we may only be able to differentiate 128 data values per data point. In this case, the maximal entropy is reduced to $r \times 60 \times \log_2(128) = 420r$ bits.
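The aggregation and plotting steps can be sketched in Python as follows. The synthetic price series, the floor-division rounding of the per-minute means, and the assumption that the plot axis spans the series' own value range are illustrative choices not specified in the text; the entropy figures reproduce the 1920 and 420 bits per series (i.e., r = 1).

```python
import math

def aggregate(series, window=12):
    """Replace each block of `window` consecutive samples by its mean.

    Floor division keeps each mean in whole cents, mimicking storage as
    unsigned integers (the rounding rule is an assumption).
    """
    if len(series) % window:
        raise ValueError("series length must be a multiple of the window")
    return [sum(series[i:i + window]) // window
            for i in range(0, len(series), window)]

def quantize(series, levels=128):
    """Map integer values to one of `levels` distinguishable plot positions.

    Assumes the plot's value axis spans the series' own min..max range.
    """
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1  # avoid division by zero for a flat series
    return [min(levels - 1, (v - lo) * levels // span) for v in series]

def max_entropy(num_points, values_per_point):
    """H_max in bits when every point may take any of `values_per_point` values."""
    return num_points * math.log2(values_per_point)

# Synthetic one-hour series in cents, one sample every 5 s (720 samples).
raw = [10_000 + (i % 50) for i in range(720)]
per_minute = aggregate(raw)        # 60 means
on_screen = quantize(per_minute)   # 60 values in 0..127

print(max_entropy(len(per_minute), 2**32))  # 1920.0
print(max_entropy(len(on_screen), 128))     # 420.0
```

Note that the entropy figures depend only on the number of points and the number of distinguishable values per point, not on the particular prices.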

When one observes these $r$ time series, one may identify some specific features, such as [rise, fall, or flat], [slow, medium, or fast], [stable, uneven, or volatile] and so on. These features become a new set of variables defined at the level of an hour-long time series. If we construct a new alphabet based on these feature variables, its entropy will be much less than $23040r$ bits. For example, if there are 10 feature variables, each with 8 valid values, the maximal entropy of this “observational” alphabet is $r \times 10 \times \log_2(8) = 30r$ bits.
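A minimal Python sketch of the “observational” alphabet, assuming three of the feature variables named above and a hypothetical threshold rule for classifying one of them. The maximal entropy is the sum of log2 of the number of valid values per feature variable, which reproduces the 30 bits per series for 10 variables with 8 values each.

```python
import math

# Hypothetical feature variables, with valid values taken from the text.
FEATURES = {
    "direction": ("rise", "fall", "flat"),
    "speed": ("slow", "medium", "fast"),
    "stability": ("stable", "uneven", "volatile"),
}

def feature_max_entropy(features, r=1):
    """H_max of the observational alphabet for r series, in bits.

    Each series contributes sum(log2 |values|) over its feature variables,
    assuming every combination of feature values is a valid letter.
    """
    return r * sum(math.log2(len(values)) for values in features.values())

def direction(series, tolerance=0.01):
    """Classify a series by its relative net change (illustrative rule only)."""
    change = (series[-1] - series[0]) / series[0]
    if change > tolerance:
        return "rise"
    if change < -tolerance:
        return "fall"
    return "flat"

ten_features_eight_values = {f"f{i}": tuple(range(8)) for i in range(10)}
print(feature_max_entropy(ten_features_eight_values))  # 30.0, as in the text
print(direction([10_000, 10_020, 10_300]))             # rise
```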

When one analyzes the relations among these $r$ time series, one may, for instance, compute the correlation indices between every pair of time series. This yields $r(r-1)/2$ numbers. Assuming that these are represented using 32-bit floating-point numbers, the maximal entropy of this “analytical” alphabet is around $15r(r-1)$ bits, as the single-precision floating-point format supports some $2^{30.7}$ values in $[-1, 1]$. When we visualize these correlation indices by mapping them to, for instance, five colors representing $[-1, -0.5, 0, 0.5, 1]$, the maximal entropy is reduced to $\log_2(5)\, r(r-1)/2 \approx 1.16\, r(r-1)$ bits.

Fig. 2. An example transformation of alphabets during a data analysis and visualization process. From left to right, the initial alphabet corresponds to $r$ time series, each capturing a share price at 5-second intervals within an hour. For each time series, the 12 data points in every minute are then aggregated into a mean value. The $r$ time series are then visualized as line plots. The analyst identifies various features during the visualization, such as different levels of rise or fall, different speeds, etc. Meanwhile, the analyst computes the correlation indices between each pair of time series and visualizes these using, for instance, a circular graph plot, where correlation indices are mapped to five different colors. The analyst finally makes a decision for each of the $r$ shares as to buy, sell or hold. The maximal entropy $\mathcal{H}_{\max}$ shows a decreasing trend from left to right.

One may wish to make a decision with three options, [buy, sell, or hold]. In this case, this “decisional” alphabet for each time series has only three letters. The maximal entropy of this alphabet is $\log_2(3) \approx 1.58$ bits, i.e., less than 2 bits. If a decision has to be made for all $r$ time series, we have less than $2r$ bits. Fig. 2 illustrates the above-mentioned changes of alphabets with different maximal entropy values. The final alphabet ultimately defines the visualization task, while some intermediate alphabets may also capture subtasks in a data analysis and visualization process.
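
The arithmetic behind these maximal-entropy figures can be tabulated with a short Python sketch. Like the text, it assumes uniformly distributed, independent symbols, so each value is an upper bound rather than a measured entropy. The function names and the choice $r = 10$ are illustrative assumptions; the two correlation stages form a parallel branch in Fig. 2, so they are listed separately from the main chain.

```python
import math


def max_entropy_bits(letters_per_symbol: float, num_symbols: float) -> float:
    """Maximal entropy (bits) of num_symbols independent symbols, each with
    letters_per_symbol equally likely values."""
    return num_symbols * math.log2(letters_per_symbol)


def fig2_max_entropies(r: int) -> dict:
    """Maximal entropy of each alphabet in the share-price example of Fig. 2."""
    pairs = r * (r - 1) / 2  # number of pairwise correlation indices
    return {
        # Main chain: raw data -> per-minute means -> line plots -> features -> decisions
        "raw: 720 x 32-bit values": max_entropy_bits(2**32, r * 720),
        "means: 60 x 32-bit values": max_entropy_bits(2**32, r * 60),
        "line plot: 60 x 128 levels": max_entropy_bits(128, r * 60),
        "features: 10 vars x 8 values": max_entropy_bits(8, r * 10),
        "decisions: buy/sell/hold": max_entropy_bits(3, r),
        # Parallel branch: pairwise correlations and their 5-color encoding
        "correlations: ~2^30.7 floats": max_entropy_bits(2**30.7, pairs),
        "correlations: 5 colors": max_entropy_bits(5, pairs),
    }


if __name__ == "__main__":
    for stage, bits in fig2_max_entropies(r=10).items():
        print(f"{stage:30s} {bits:12.1f} bits")
```

The per-minute, line-plot, feature and decision rows reproduce $1920r$, $420r$, $30r$ and under $2r$ bits respectively.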

4.3 Measuring Cost-Benefit Ratio 

From Fig. 2, one observation that we can make is that there is almost always a reduction of maximal entropy from the original data alphabet to the decisional alphabet. This relates to one of the basic objectives in statistical inference, i.e., to optimize the process between the initial alphabet and the final alphabet with minimal loss of information that is “important” to the decision based on the final alphabet. However, as visualization processes involve both machine-centric and human-centric mappings, it is necessary (i) to optimize both types of mapping in an integrated manner, (ii) to take into account “soft” information that can be introduced by human analysts during the process, and (iii) to consider information loss as part of a cost-benefit analysis.

Let us consider a sequential workflow with $L$ processing steps. There are $L + 1$ alphabets along the workflow. Let $Z_s$ and $Z_{s+1}$ be two consecutive alphabets such that:

$$F_s: Z_s \longrightarrow Z_{s+1}$$

where $F_s$ is a mapping function, which can be an analytical algorithm that extracts features from data, a visual mapping that transforms data to a visual representation, or a human decision process that selects an outcome from a set of options.

The cost of executing $F_s$ as part of a visualization process can be measured in many ways. Perhaps the most generic cost measure is energy, since energy would be consumed by a computer to run an algorithm or to create a visualization, as well as by a human analyst to read data, view visualization, reason about a possible relationship, or make a decision. We denote this generic measurement as a function $\mathcal{C}_{energy}(F_s)$. While measuring energy usage by computers is becoming more practical [58], measuring that of human activities, especially cognitive activities, may not be feasible in most situations. A more convenient measurement is time, $\mathcal{C}_{time}(F_s)$, which can be considered as an approximation of $\mathcal{C}_{energy}(F_s)$. Another is a monetary measurement of computational costs or employment costs, which represents a subjective approximation from a business perspective. Without loss of generality, we will use $\mathcal{C}(F_s)$ as our cost function in this section.

Definition 1 (Alphabet Compression Ratio). As shown in Fig. 2, a mapping function (i.e., a machine or human process) usually facilitates the reduction of data space at each stage of data processing, though the reduction is not guaranteed. We can measure the level of reduction as the alphabet compression ratio (ACR) of a mapping $F_s$:

$$\mathcal{R}_{ACR}(F_s) = \frac{\mathcal{H}(Z_{s+1})}{\mathcal{H}(Z_s)} \tag{1}$$


where $\mathcal{H}$ is the Shannon entropy measure. In a closed machine-centric processing system that meets the condition of a Markov chain, we have $\mathcal{H}(Z_s) \ge \mathcal{H}(Z_{s+1})$. This is the data processing inequality [18]. In such a system, $\mathcal{R}_{ACR}$ is a normalized and unitless entropy measure in $[0, 1]$, as first proposed by Golan in [24] (see also [27]). However, Chen and Jänicke pointed out that the Markov chain condition is broken in most visualization processes [14], and further examples were given in [13]. Hence, we do not assume that $\mathcal{H}(Z_s) \ge \mathcal{H}(Z_{s+1})$ here, since $F_s$ can be a human-centric transformation, unless one encodes all possible variants of “soft” information and knowledge in the initial data alphabet.
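
The two quantities in Eq. (1) can be computed directly from probability mass functions. The following Python sketch is a minimal illustration, assuming each alphabet is represented as a list of letter probabilities (the function names are our own); the example maps a uniform 8-letter alphabet onto a uniform 2-letter one.

```python
import math
from typing import Sequence


def shannon_entropy(pmf: Sequence[float]) -> float:
    """Shannon entropy in bits of a probability mass function (0 log 0 := 0)."""
    if any(p < 0 for p in pmf) or abs(sum(pmf) - 1.0) > 1e-9:
        raise ValueError("pmf must be non-negative and sum to 1")
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def alphabet_compression_ratio(pmf_in: Sequence[float], pmf_out: Sequence[float]) -> float:
    """ACR of Eq. (1): H(Z_{s+1}) / H(Z_s); undefined when H(Z_s) = 0."""
    h_in = shannon_entropy(pmf_in)
    if h_in == 0:
        raise ValueError("H(Z_s) = 0: the input alphabet is already certain")
    return shannon_entropy(pmf_out) / h_in


# Hypothetical example: 8 equally likely letters mapped onto 2 equally likely letters.
print(alphabet_compression_ratio([1 / 8] * 8, [1 / 2] * 2))  # 1 bit / 3 bits
```

Because the sketch deliberately does not enforce $\mathcal{H}(Z_s) \ge \mathcal{H}(Z_{s+1})$, it returns a ratio above 1 when the output alphabet carries more entropy than the input, which is the situation the text allows for human-centric mappings.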

Meanwhile, given an output of an analytical process, $F_s$, an analyst will gain an impression about the input. Considering the time series transformation in Fig. 2, for example, on learning the mean price value for each minute, an analyst may have a conjecture about the 12 original data values. Viewing a visualization of each time series plot at a resolution of 128 possible values per data point, an analyst may infer, estimate or guess the time series in its original resolution of $2^{32}$ possible values per data point. Let us denote an impression about $Z_s$ as a variable $Z'_s$, which is a result of a mapping $G_s$ such that:

$$G_s: Z_{s+1} \longrightarrow Z'_s$$

where $Z'_s$ is the alphabet of this impression with a probability mass function representing the inferred or guessed probability of each letter in $Z'_s$. Note that $G_s$ is a reconstruction function, similar to what was discussed in [36]. In most cases, $G_s$ is only a rough approximation of the true inverse function $F_s^{-1}$. The difference between such an impression $Z'_s$ obtained from observing letters in $Z_{s+1}$ and the actual $Z_s$ is defined by the Kullback-Leibler divergence (or relative entropy) [18]:

$$\mathcal{D}_{KL}(Z'_s \| Z_s) = \mathcal{D}_{KL}(G_s(Z_{s+1}) \| Z_s) = \sum_j p(z'_{s,j}) \log_2 \frac{p(z'_{s,j})}{q(z_{s,j})}$$

where $z'_{s,j} \in Z'_s$ and $z_{s,j} \in Z_s$, and $p$ and $q$ are two probability mass functions associated with $Z'_s$ and $Z_s$ respectively. $\mathcal{D}_{KL} = 0$ if and only if $p = q$, and $\mathcal{D}_{KL} > 0$ otherwise. Note that $\mathcal{D}_{KL}$ is not a metric, as it is not symmetric. The definition of $\mathcal{D}_{KL}$ is accompanied by a precondition that $q = 0$ implies $p = 0$.
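
A minimal Python sketch of this divergence follows, assuming both probability mass functions are given as lists indexed over the same letters (the function name is our own). It enforces the precondition that $q = 0$ implies $p = 0$ by raising an error, and the example illustrates the asymmetry.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(p || q) in bits, where p is the pmf of the impression Z'_s and q the pmf
    of the actual Z_s, both indexed over the same letters."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same letters")
    total = 0.0
    for pj, qj in zip(p, q):
        if pj == 0:
            continue  # 0 log(0 / q) := 0
        if qj == 0:
            # Precondition violated: q = 0 must imply p = 0.
            raise ValueError("D_KL undefined: q(z) = 0 where p(z) > 0")
        total += pj * math.log2(pj / qj)
    return total


# Hypothetical impression p versus actual distribution q over the same two letters.
p, q = [0.5, 0.5], [0.25, 0.75]
print(kl_divergence(p, q), kl_divergence(q, p))  # the two values differ
```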

Definition 2 (Potential Distortion Ratio). With the $\log_2$ formula, $\mathcal{D}_{KL}$ is also measured in bits. The higher the number of bits, the further the impression $Z'_s$ deviates from $Z_s$. The potential distortion ratio (PDR) of a mapping $F_s$ is thus:

$$\mathcal{R}_{PDR}(F_s) = \frac{\mathcal{D}_{KL}(Z'_s \| Z_s)}{\mathcal{H}(Z_s)} \tag{2}$$


Both $\mathcal{R}_{ACR}(F_s)$ and $\mathcal{R}_{PDR}(F_s)$ are unitless. They can be used to moderate the cost of executing $F_s$, i.e., $\mathcal{C}(F_s)$. Since $\mathcal{H}(Z_{s+1})$ indicates the intrinsic uncertainty of the output alphabet and $\mathcal{D}_{KL}(Z'_s \| Z_s)$ indicates the uncertainty caused by $F_s$, the sum of $\mathcal{R}_{ACR}(F_s)$ and $\mathcal{R}_{PDR}(F_s)$ indicates the level of combined uncertainty in relation to the original uncertainty associated with $Z_s$.

Definition 3 (Effectual Compression Ratio). The effectual compression ratio (ECR) of a mapping $F_s$ from $Z_s$ to $Z_{s+1}$ is a measure of the ratio of the uncertainty after a transformation $F_s$ to that before it:

$$\mathcal{R}_{ECR}(F_s) = \frac{\mathcal{H}(Z_{s+1}) + \mathcal{D}_{KL}(Z'_s \| Z_s)}{\mathcal{H}(Z_s)}, \quad \text{for } \mathcal{H}(Z_s) > 0 \tag{3}$$


When $\mathcal{H}(Z_s) = 0$, it means that variable $Z_s$ has only one probable value, and it is absolutely certain. Hence, the transformation $F_s$ is unnecessary in the first place. The measure of ECR encapsulates the tradeoff between ACR and PDR, since decreasing ACR (i.e., more compressed) often leads to an increase of PDR (i.e., harder to infer $Z_s$), and vice versa. However, this tradeoff is rarely a linear (negative) correlation. Finding the most appropriate tradeoff is thus an optimization problem, which is further enriched below when we incorporate the cost $\mathcal{C}(F_s)$ as another balancing factor.
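
To make the ACR/PDR tradeoff concrete, the Python sketch below computes Eqs. (1) to (3) for a toy mapping. The reconstruction $G_s$ used here, which spreads each aggregated probability uniformly over its group, is a hypothetical choice made only for illustration, and the helper names are our own. By construction, ECR equals ACR plus PDR.

```python
import math
from typing import Sequence, Tuple


def entropy(pmf: Sequence[float]) -> float:
    """Shannon entropy in bits (0 log 0 := 0)."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(p || q) in bits; assumes the precondition that q = 0 implies p = 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def compression_ratios(z_s: Sequence[float], z_s1: Sequence[float],
                       impression: Sequence[float]) -> Tuple[float, float, float]:
    """Return (ACR, PDR, ECR) of Eqs. (1)-(3) for one mapping F_s.

    z_s: pmf of Z_s; z_s1: pmf of Z_{s+1}; impression: pmf of Z'_s over Z_s's letters.
    """
    h_s = entropy(z_s)
    if h_s <= 0:
        raise ValueError("Eq. (3) requires H(Z_s) > 0")
    acr = entropy(z_s1) / h_s
    pdr = kl(impression, z_s) / h_s
    ecr = (entropy(z_s1) + kl(impression, z_s)) / h_s
    return acr, pdr, ecr


# Hypothetical mapping: four letters aggregated into two groups, {a, b} and {c, d}.
z_s = [0.4, 0.3, 0.2, 0.1]
z_s1 = [0.7, 0.3]
# Hypothetical reconstruction G_s: spread each group's probability uniformly.
impression = [0.35, 0.35, 0.15, 0.15]
print(compression_ratios(z_s, z_s1, impression))
```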

Definition 4 (Benefit). We can now define the benefit of a mapping $F_s$ from $Z_s$ to $Z_{s+1}$ as:

$$\mathcal{B}(F_s) = \mathcal{H}(Z_s) - \mathcal{H}(Z_{s+1}) - \mathcal{D}_{KL}(Z'_s \| Z_s) \tag{4}$$

The unit of this information-theoretic measure is the bit. When $\mathcal{B}(F_s) = 0$, the transformation does not create any change in the informational structure captured by the entropy. In other words, there is no informational difference between observing variable $Z_s$ and observing $Z_{s+1}$. When $\mathcal{B}(F_s) < 0$, the transformation has introduced more uncertainty, which is undesirable. When $\mathcal{B}(F_s) > 0$, the transformation has introduced positive benefit by reducing the uncertainty. This definition can be related to Shannon's grouping property [18].
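
Eq. (4) and its three sign cases can be sketched in Python as follows; equivalently, $\mathcal{B}(F_s) = \mathcal{H}(Z_s)\,(1 - \mathcal{R}_{ECR}(F_s))$ whenever $\mathcal{H}(Z_s) > 0$. The distributions are hypothetical, the uniform-within-group impression is an illustrative assumption, and the helper names are our own.

```python
import math
from typing import Sequence


def entropy(pmf: Sequence[float]) -> float:
    """Shannon entropy in bits (0 log 0 := 0)."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(p || q) in bits; assumes q = 0 implies p = 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def benefit(z_s: Sequence[float], z_s1: Sequence[float], impression: Sequence[float]) -> float:
    """B(F_s) of Eq. (4): H(Z_s) - H(Z_{s+1}) - D_KL(Z'_s || Z_s), in bits."""
    return entropy(z_s) - entropy(z_s1) - kl(impression, z_s)


def interpret(b: float, tol: float = 1e-12) -> str:
    """Map the sign of B(F_s) to the three cases discussed in the text."""
    if b > tol:
        return "positive: uncertainty reduced"
    if b < -tol:
        return "negative: uncertainty introduced"
    return "zero: no informational change"


# Hypothetical aggregation of four letters into two groups, with a uniform-within-group impression.
b1 = benefit([0.4, 0.3, 0.2, 0.1], [0.7, 0.3], [0.35, 0.35, 0.15, 0.15])
# A certain input mapped onto a two-letter output: uncertainty is added.
b2 = benefit([1.0], [0.5, 0.5], [1.0])
print(round(b1, 3), interpret(b1))
print(b2, interpret(b2))
```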

Theorem (Generalized Grouping Property). Let $X$ be a variable that is associated with an $N$-letter alphabet $\mathbb{X}$ and a normalized $N$-dimensional discrete distribution $p(x)$, $x \in \mathbb{X}$. When we group letters in $\mathbb{X}$ into $M$ subsets, we derive a new variable $Y$ with an $M$-letter alphabet $\mathbb{Y}$ and a normalized $M$-dimensional discrete distribution $q(y)$, $y \in \mathbb{Y}$. Then:

$$\mathcal{H}(X) = \mathcal{H}(Y) + \sum_{k=1}^{M} q(y_k)\, \mathcal{H}_k \tag{5}$$

where $\mathcal{H}_k$ is the entropy of the local distribution of the original letters within the $k$th subset of $\mathbb{X}$. Comparing Eq. (4) and Eq. (5), we can see that the last term on the right in Eq. (5) is replaced with the Kullback-Leibler divergence term in Eq. (4). The equality in Eq. (5) is replaced with a measure of difference in Eq. (4). This is because of the nature of data analysis and visualization. After each transformation $F_s$, the analyst is likely to infer, estimate or guess the local distribution within each subset, when necessary, from the observation of $Y$ in the context of Eq. (5) or $Z_{s+1}$ in the context of Eq. (4), in conjunction with some “soft” information and knowledge, as mentioned in Section 1.
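
Eq. (5) can be checked numerically. The Python sketch below groups the letters of a hypothetical four-letter distribution and confirms that $\mathcal{H}(X)$ equals $\mathcal{H}(Y)$ plus the weighted local entropies; the distribution, the partition and the helper names are arbitrary choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def entropy(pmf: Sequence[float]) -> float:
    """Shannon entropy in bits (0 log 0 := 0)."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def grouping_terms(p: Sequence[float], groups: List[List[int]]) -> Tuple[float, float, float]:
    """Return (H(X), H(Y), sum_k q(y_k) H_k) for a partition of X's letters into M subsets.

    Each entry of `groups` lists the indices of the letters of X that form one letter y_k of Y.
    """
    q = [sum(p[i] for i in g) for g in groups]  # q(y_k)
    # H_k: entropy of the renormalized local distribution inside subset k
    local = [entropy([p[i] / qk for i in g]) if qk > 0 else 0.0 for g, qk in zip(groups, q)]
    weighted = sum(qk * hk for qk, hk in zip(q, local))
    return entropy(p), entropy(q), weighted


# Hypothetical 4-letter distribution grouped into M = 2 subsets.
hx, hy, w = grouping_terms([0.4, 0.3, 0.2, 0.1], [[0, 1], [2, 3]])
print(hx, hy + w)  # equal up to floating-point rounding
```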

Definition 5 (Incremental Cost-Benefit Ratio). The incremental cost-benefit ratio (Incremental CBR) of a mapping $F_s$ from $Z_s$ to $Z_{s+1}$ is thus defined as the ratio between benefit $\mathcal{B}(F_s)$ and cost $\mathcal{C}(F_s)$:

$$\mathit{CBR}(F_s) = \frac{\mathcal{B}(F_s)}{\mathcal{C}(F_s)} = \frac{\mathcal{H}(Z_s) - \mathcal{H}(Z_{s+1}) - \mathcal{D}_{KL}(Z'_s \| Z_s)}{\mathcal{C}(F_s)} \tag{6}$$

Note that we use cost as the denominator because (i) the benefit can be zero, while the cost of transformation cannot be zero as long as there is an action of transformation; and (ii) it is preferable that a larger value means more cost-beneficial.
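
Eq. (6) translates directly into code. In this Python sketch (hypothetical function name, made-up numbers), two candidate mappings of the same input are compared: the one with the larger distortion can still win if it is sufficiently cheaper, which is the balancing role the cost plays.

```python
def incremental_cbr(h_in: float, h_out: float, d_kl: float, cost: float) -> float:
    """Incremental CBR of Eq. (6): (H(Z_s) - H(Z_{s+1}) - D_KL(Z'_s || Z_s)) / C(F_s).

    Entropies and the divergence are in bits; cost is in whatever unit C(F_s) uses.
    """
    if cost <= 0:
        raise ValueError("C(F_s) must be positive for any actual transformation")
    if d_kl < 0:
        raise ValueError("D_KL cannot be negative")
    return (h_in - h_out - d_kl) / cost


# Two hypothetical candidate mappings of the same 8-bit input:
a = incremental_cbr(h_in=8, h_out=3, d_kl=1, cost=2)  # less distortion, more expensive
b = incremental_cbr(h_in=8, h_out=2, d_kl=3, cost=1)  # more distortion, cheaper
print(a, b)  # 2.0 3.0
```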

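Given a set of cascading mapping functions, $F_1, F_2, \ldots, F_L$, which transform alphabets from $Z_1$ to $Z_{L+1}$, we can simply add up their costs and benefits as:

$$\mathcal{C}_{total} = \sum_{s=1}^{L} \mathcal{C}(F_s)$$

$$\mathcal{B}_{total} = \sum_{s=1}^{L} \mathcal{B}(F_s) = \mathcal{H}(Z_1) - \mathcal{H}(Z_{L+1}) - \sum_{s=1}^{L} \mathcal{D}_{KL}(Z'_s \| Z_s)$$

The overall cost-benefit ratio (Overall CBR) is thus $\mathcal{B}_{total} / \mathcal{C}_{total}$.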
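
The cascade sums can be sketched in Python as below. The `Step` record and `overall_cbr` are hypothetical names, and the entropies, divergences and costs in the example are made up; the sketch also checks that the entropy terms telescope to $\mathcal{H}(Z_1) - \mathcal{H}(Z_{L+1})$.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One mapping F_s of a sequential workflow (values supplied by the analyst)."""
    h_in: float   # H(Z_s) in bits
    h_out: float  # H(Z_{s+1}) in bits
    d_kl: float   # D_KL(Z'_s || Z_s) in bits
    cost: float   # C(F_s), e.g. in seconds


def overall_cbr(steps: List[Step]) -> float:
    """Overall CBR = B_total / C_total for cascading mappings F_1, ..., F_L."""
    if not steps:
        raise ValueError("the workflow needs at least one mapping")
    # Consecutive mappings share an alphabet: Z_{s+1} is the output of F_s and the input of F_{s+1}.
    for prev, nxt in zip(steps, steps[1:]):
        if abs(prev.h_out - nxt.h_in) > 1e-9:
            raise ValueError("H(Z_{s+1}) of one step must equal H(Z_s) of the next")
    c_total = sum(s.cost for s in steps)
    b_total = sum(s.h_in - s.h_out - s.d_kl for s in steps)
    # Sanity check: the entropy terms telescope to H(Z_1) - H(Z_{L+1}).
    telescoped = steps[0].h_in - steps[-1].h_out - sum(s.d_kl for s in steps)
    assert abs(b_total - telescoped) < 1e-9
    return b_total / c_total


# Hypothetical three-step workflow with made-up entropies, divergences and costs.
workflow = [Step(10, 6, 1.0, 2), Step(6, 2, 0.5, 1), Step(2, 1, 0.5, 1)]
print(overall_cbr(workflow))
```
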

For workflows containing parallel mappings, the merge of CBR at a joint partly depends on the semantics of the cost and benefit measures. If we are concerned about energy or monetary cost, the simple summation of the cost measures arriving at a joint makes sense. If we are concerned about the time taken, we may compute the maximum cost at a joint. If all parallel branches arriving at a joint contain only machine-centric processes, the benefit is capped by the entropy at the beginning of the branching-out. The combined benefit can be estimated by taking into account the mutual information between the arriving alphabets. When these parallel branches involve human-centric processing, “soft” information will be added into the process. The combined benefit can then be estimated in the range between the maximum and the summation of the arriving benefit measures.
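
The merge rules for a joint can be sketched in Python as follows, with hypothetical function names. The machine-centric case, which requires the mutual information between the arriving alphabets, is not sketched. The benefit-range helper additionally assumes non-negative branch benefits, an assumption of ours that keeps the maximum below the sum.

```python
from typing import List, Tuple


def merge_costs(costs: List[float], measure: str) -> float:
    """Cost arriving at a joint of parallel branches.

    Energy and monetary costs accumulate, so they are summed; for elapsed time the
    joint waits for the slowest branch, so the maximum is taken (this assumes the
    branches genuinely run concurrently).
    """
    if not costs:
        raise ValueError("at least one branch is required")
    if measure in ("energy", "money"):
        return sum(costs)
    if measure == "time":
        return max(costs)
    raise ValueError(f"unknown cost measure: {measure!r}")


def human_centric_benefit_range(benefits: List[float]) -> Tuple[float, float]:
    """Range of the combined benefit when branches involve human-centric processing:
    from the largest single benefit up to the sum (assumes non-negative benefits)."""
    if any(b < 0 for b in benefits):
        raise ValueError("this estimate assumes non-negative branch benefits")
    return max(benefits), sum(benefits)


print(merge_costs([3.0, 5.0], "energy"), merge_costs([3.0, 5.0], "time"))
print(human_centric_benefit_range([2.0, 4.0]))
```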

In this paper, we largely focus on the workflows for conducting data 
analysis and visualization. Our formulation of cost-benefit analysis 
can be extended to include the cost of development and maintenance. 
It is more appropriate to address such an extension in future work. 

5 Examples of Workflow Analysis 

In this section, we consider several successful visualization processes in the literature. We analyze their cost-benefit ratios in comparison with possible alternative processes. The comparison serves as an initial validation of the information-theoretic measures proposed in the previous section. Like most theoretical developments, the validation of the proposed information-theoretic measures should be, and is expected to be, a long-term undertaking, along with the advancement of techniques and the increasing effort for collecting performance data about various human-centric processes, e.g., through empirical studies.

5.1 Interaction in Visualization 

In data analysis and visualization, human-computer interaction plays a significant role in breaking the condition of the data processing inequality [14]. It enables human analysts to introduce “soft” information and knowledge into such a process. Here we consider that the initial alphabet at the beginning of the process represents “hard” data, e.g., $Z_1$ for representing variants of $r$ time series. The “soft” information and knowledge is “external” to the process, which can therefore no longer be a closed system.

Fig. 3. Interaction is one of the means for introducing “soft” information into a visualization process. This figure shows a sequence of interactions for overview first and details on demand. At some stage of a visualization process, the system receives a large detailed visual representation $z \in Z_s$. It creates an overview. A viewer selects a part of the overview and requests a detailed view, which is part of $z$. From this detailed view, the viewer explores a few nearby detailed views. At some stage, the viewer decides to finish the exploration and makes up his/her mind about something based on the overview and partial observation of the detailed visual representation. At each step, all possible valid inputs and outputs of a transformation are letters of an alphabet.

Interaction has been studied extensively in the context of visualization (e.g., [17,47,61,72,78]). One of the commonly used forms of interaction is “overview first, zoom and filter, then details on demand” [54]. It may feature several types of actions, including select, explore, abstract/elaborate and filter, as defined by Yi et al. [78]. Fig. 3 illustrates such a process with the abstract notion presented in Section 4. Under an information-theoretic framework, the input and output of each transformation are considered in a holistic manner, i.e., as alphabets (e.g., all variants of images that may be displayed in a context) rather than individual letters (e.g., an image).

One may imagine a very large image (or map) as an instance of alphabet $Z_s$ at the top of Fig. 3. An interactive system first presents viewers with an overview, which is an instance of alphabet $Z_{s+1}$. A viewer may select a part of the overview and apply a zoom-in operation. The detailed view at that location is an instance of alphabet $Z_{s+2}$, which is a subset of $Z_s$. From this detailed view, the viewer may choose to explore a nearby location, and so on. At some stage, the viewer decides to finish the exploration and makes up his/her mind about something based on the overview and the parts of the full image (or map) representation that have been explored so far. Examples of “soft” information and knowledge in this case may include how important the individual subsets of $Z_s$ are to the viewer, and which direction of exploration from one subset to the next is more promising.

Let us consider the incremental CBR (cost-benefit ratio) of the overview transformation. Different techniques can be used to compute an overview visualization, yielding different ACR (alphabet compression ratio) and PDR (potential distortion ratio). As this is a machine-centric process, the term $\mathcal{D}_{KL}(Z'_s \| Z_s)$ in PDR can be replaced with

$$\mathcal{H}(Z_s) - \mathcal{I}(Z_s, Z_{s+1})$$

where $\mathcal{I}$ is the mutual information between alphabets $Z_s$ and $Z_{s+1}$, with an assumption that a perfect inverse mapping $F_s^{-1}$ from $Z_{s+1}$ to $Z_s$ can infer all mutual information, but no more than that. Hence, we can rewrite Eq. (6) as:

$$\mathit{CBR}(F_s) = \frac{\mathcal{I}(Z_s, Z_{s+1}) - \mathcal{H}(Z_{s+1})}{\mathcal{C}(F_s)} \tag{7}$$

Recall an example discussed in [14], where two different overview techniques for flow visualization were used to illustrate the optimization based on mutual information $\mathcal{I}$. This criterion is consistent with Eq. (6) when one assumes that the two techniques maintain the same entropy for the output alphabet $Z_{s+1}$, while incurring the same cost $\mathcal{C}(F_s)$. Eq. (7) is thus an extension of what was proposed in [14].
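
A Python sketch of Eq. (7) follows, assuming the machine-centric overview mapping is described by a joint probability mass function over input and output letters (a modelling assumption of ours; the helper names and both joint distributions are hypothetical). It mirrors the comparison described above: the two techniques share the same $\mathcal{H}(Z_{s+1})$ and cost, so the one preserving more mutual information obtains the higher CBR.

```python
import math
from typing import List


def mutual_information(joint: List[List[float]]) -> float:
    """I(Z_s, Z_{s+1}) in bits from a joint pmf joint[i][j] = P(z_s = i, z_{s+1} = j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )


def output_entropy(joint: List[List[float]]) -> float:
    """H(Z_{s+1}) from the column marginals of the joint pmf."""
    return -sum(p * math.log2(p) for p in (sum(col) for col in zip(*joint)) if p > 0)


def cbr_eq7(joint: List[List[float]], cost: float) -> float:
    """Eq. (7): (I(Z_s, Z_{s+1}) - H(Z_{s+1})) / C(F_s)."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (mutual_information(joint) - output_entropy(joint)) / cost


faithful = [[0.5, 0.0], [0.0, 0.5]]  # hypothetical technique A
noisy = [[0.4, 0.1], [0.1, 0.4]]     # hypothetical technique B
for name, joint in (("A", faithful), ("B", noisy)):
    print(name, round(mutual_information(joint), 3), round(cbr_eq7(joint, cost=1.0), 3))
```

Since $\mathcal{I}(Z_s, Z_{s+1}) \le \mathcal{H}(Z_{s+1})$, the numerator of Eq. (7) never exceeds zero, so in this sketch the values are best read comparatively.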

In Fig. 3, the transformations following $Z_{s+1}$ are all human-centric processes. We can observe that these transformations will incur costs such as cognitive effort and time for interaction. One important consideration is the prior knowledge about $Z_s$, i.e., how much information is already known to the viewers, and how much is uncertain. Considering the same flow visualization example as in [14], one may estimate:

• How likely is it that viewers know that $Z_s$ are texture-based representations of vector fields?

• How confident are viewers about the correctness of the feature-extraction technique used to create an overview?

• How long have the viewers been working on the simulation model that generates the vector fields being visualized?

Such inference can be translated into an estimation of the term $\mathcal{D}_{KL}(Z'_s \| Z_s)$ in Eqs. (3) and (6). Some further optimization can often be implemented on top of the basic form of overview first and details on demand. In many scenarios, we observe that an experienced viewer may find step-by-step zoom operations frustrating, as the viewer knows exactly where the interesting part of a detailed representation is. For example, in flow simulation, scientists often work on the same simulation problem for months, and have a good mental overview of $Z_s$. In such a case, when the interactive visualization system has a fast track for reaching a specific detailed view (e.g., the last location visited), it reduces the cost of step-by-step zoom operations. However, this approach may not be applicable to an online map system, where each search session is likely to be for a new search task.

5.2 Disseminative Visualization 

The history of the time series plot can be traced back more than a millennium. If success is measured by usage, it is no doubt one of the most successful visual representations. However, its display space utilization is rather poor in comparison with a binary digits view [14]. Fig. 4 shows two such representations that are used as disseminative visualization for the scenario in Fig. 2. The dataset being displayed is a time series with 60 data points, i.e., an instance of $Z_2$ in Fig. 2. Assume that the value of this particular share has been largely moving between 100 and 200 cents. Hence the entropy of $Z_{a,1}$ (identical to $Z_{b,1}$) is estimated to be about 420 bits, significantly below the maximal entropy of the data representation.

The binary digits view uses a 2×2 pixel block per digit, and requires 32×60 blocks (7,680 pixels) for the plotting canvas. Using the same number of pixels, 128×60, the time series plot is an instance of $Z_{a,2}$. During dissemination, the presenter (or analyst) points out “stable” and “rise” features to a viewer (or client), suggesting a decision “to hold”. The overall CBRs for the two pipelines in Fig. 4 are:

$$\mathit{CBR}_{plot} = \frac{\sum_{j=1}^{3} \left( \mathcal{H}(Z_{a,j}) - \mathcal{H}(Z_{a,j+1}) - \mathcal{D}_{KL}(Z'_{a,j} \| Z_{a,j}) \right)}{\sum_{j=1}^{3} \mathcal{C}(F_{a,j})}$$

$$\mathit{CBR}_{binary} = \frac{\sum_{j=1}^{3} \left( \mathcal{H}(Z_{b,j}) - \mathcal{H}(Z_{b,j+1}) - \mathcal{D}_{KL}(Z'_{b,j} \| Z_{b,j}) \right)}{\sum_{j=1}^{3} \mathcal{C}(F_{b,j})}$$

Fig. 4. Comparison between the time series plot and the binary digits view for disseminative visualization. The same legend as in Fig. 2 applies. The estimated benefit and cost values here are based on heuristic reasoning and are for an illustrative purpose. For example, for $\mathcal{B}_{a,2}$ we consider two feature variables, [stable, uneven, volatile] and [rise, fall, flat]. Hence the maximal entropy of $Z_{a,3}$ is about 3.17 bits. As the $\mathcal{D}_{KL}$ term for $\mathcal{B}_{a,2}$ will indicate some uncertainty, the estimated benefit is $420 - 3.17 - \mathcal{D}_{KL}(Z'_{a,2} \| Z_{a,2}) \approx 415$ bits. Meanwhile, $\mathcal{D}_{KL}$ for $\mathcal{B}_{b,2}$ is much higher.

To the presenter, the decision “to hold” has already been made, and the total CBR would be zero for either workflow. For a viewer unfamiliar with binary representations, the binary digits view is almost undecipherable. Even for a pair of trained eyes, recognizing features such as “stable” and “rise” would take a while. The inverse mapping from the features pointed out by the presenter is also rather uncertain, hence a high value for the $\mathcal{D}_{KL}$ term in $\mathcal{B}_{b,2}$. The binary digits view thereby incurs a huge cost at the feature recognition step, while bringing lower benefit. This mathematically explains the merits of the time series plot over a spatially compact binary digits view.
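
The figures quoted in this example can be reproduced with a few lines of Python. The sketch below checks that both views use the same 7,680-pixel canvas and that the two three-valued feature variables give $\log_2 9 \approx 3.17$ bits. The function names are hypothetical, the 420-bit default follows the estimate used in the Fig. 4 caption, and the $\mathcal{D}_{KL}$ argument is a placeholder to be supplied by the analyst, not a value from the text.

```python
import math


def binary_view_pixels(points: int = 60, bits: int = 32, block: int = 2) -> int:
    """Canvas size of a binary digits view: one block x block pixel square per digit."""
    return points * bits * block * block


def line_plot_pixels(points: int = 60, levels: int = 128) -> int:
    """Canvas size of a time series plot with `levels` distinguishable values per point."""
    return points * levels


def feature_max_entropy(values_per_feature=(3, 3)) -> float:
    """Maximal entropy of Z_{a,3}: [stable, uneven, volatile] x [rise, fall, flat]."""
    return sum(math.log2(v) for v in values_per_feature)


def benefit_a2(h_a2: float = 420.0, d_kl: float = 0.0) -> float:
    """B_{a,2} = H(Z_{a,2}) - H(Z_{a,3}) - D_KL(Z'_{a,2} || Z_{a,2}).

    h_a2 defaults to the estimate of about 420 bits; d_kl is a placeholder
    that an analyst would have to estimate.
    """
    return h_a2 - feature_max_entropy() - d_kl


print(binary_view_pixels(), line_plot_pixels())  # both canvases use 7,680 pixels
print(round(feature_max_entropy(), 2))           # about 3.17 bits
```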

5.3 Observational Visualization 

The example in Section 5.2 can also be considered in the context of 
observational visualization, where an analyst creates visualization for 
him/herself. Similar abstract reasoning and step-by-step inference can 
be carried out, just as in the previous example, likely for a much larger 
input data alphabet (e.g., with r time series and t hours). 

Let us consider a different example of observational visualization. Legg et al. reported an application of visualization in sports [42]. The Welsh Rugby Union required a visualization system for in-match and post-match analysis. One of the visualization tasks was to summarize events in a match, facilitating external memorization. The input datasets are typically in the form of videos, including data streams during a match, and can be generalized to include direct viewing of a match in real time. The alphabet is thus huge. The objective of supporting external memorization is to avoid watching the same videos repeatedly. Especially during a half-time interval, coaches and players cannot afford much time to watch videos.

The workflow can be coarsely divided into three major transformations, namely $F_a$: transforming real-world visual data to events data, $F_b$: transforming events data to visualization, and $F_c$: transforming observations to judgments and decisions. Clearly, transformation $F_c$ should be performed by coaches and other experts. For transformation $F_a$, two options were considered: $F_{a,1}$ for computers to detect events, and $F_{a,2}$ for humans to detect events. For transformation $F_b$, two options were considered: $F_{b,1}$ statistical graphics, and $F_{b,2}$ glyph-based event visualization. For $F_{a,1}$ and $F_{a,2}$, the letters of the output alphabet are multivariate data objects describing what type of event, when and where it happens, and who is involved. This alphabet is much smaller than the input alphabet for real-world visual data.

The team did not find any suitable computer vision techniques that could be used to detect events and generate the corresponding data objects in this application. The accuracy of available techniques was too low, hence the $\mathcal{D}_{KL}$ term for $F_{a,1}$ would yield a high level of uncertainty. Using a video annotation system, an experienced sports analyst can generate more accurate event data during or after a match. For an 80-minute Rugby match, the number of data objects generated is usually in the hundreds and sometimes in the thousands. Hence statistics can be obtained, and then visualized using statistical graphics. However, it is difficult for coaches to make decisions based on statistical graphics, as it is difficult to connect statistics with episodic memory about events. Such a difficulty corresponds to a high level of uncertainty resulting from the $\mathcal{D}_{KL}$ term for $F_{b,1}$. On the other hand, the direct depiction of events using glyphs can stimulate episodic memory much better, yielding a much lower level of uncertainty in the $\mathcal{D}_{KL}$ term for $F_{b,2}$. The team implemented the $F_{a,2}$ and $F_{b,2}$ transformations as reported in [42], while $F_{b,1}$ was also available for other tasks.

5.4 Analytical Visualization 

Oelke et al. studied a text analysis problem using visual analytics [46]. They considered a range of machine-centric and human-centric transformations in evaluating document readability. For example, the former include 141 text feature variables and their combinations. The latter include four representations at three different levels of detail. Since different combinations of machine-centric and human-centric transformations correspond to different visual analytics pipelines, their work can be seen as an optimization effort. Through experimentation and analysis, they confirmed the need for enabling analysts to observe details at the sentence or block levels. Over-aggregation (e.g., assigning a readability score to each document) is not cost-beneficial, as the tradeoff between the alphabet compression ratio (ACR) and the potential distortion ratio (PDR) is in favor of PDR.

5.5 Model-developmental Visualization 

In [57], Tam et al. compared a visualization technique and a machine learning technique in generating a decision tree as a model for expression classification. The input to this model development exercise is a set of annotated videos, each of which records one of four expressions [anger, surprise, sadness, smile]. The output is a decision tree that is to be used to classify videos automatically with reasonable accuracy. It is thus a data analysis and visualization process for creating a data analysis model. Although this sounds like a conundrum, it fits well within the scope of visualization. Tam et al. approached this problem through a series of transformations. The first transformation $F_a$ identifies 14 different facial features in each video, and records their temporal changes using a geometric or texture measurement. This results in 14 different alphabets of time series. The second transformation $F_b$ characterizes each time series using 23 different parameters. This results in a total of $322 = 14 \times 23$ variables. At the end of the second transformation, each video becomes a 322-variate data object.

For the visualization-based pipeline, the third transformation $F_{c,1}$ generates a parallel coordinates plot with 322 axes. This is followed by the fourth transformation $F_{d,1}$, where two researchers laid the big plot on the floor and spent a few hours selecting the appropriate variables for constructing a decision tree. For the machine-learning-based pipeline, the team used a public-domain tool, C4.5, as the third transformation $F_{c,2}$, which generates a decision tree from a multivariate dataset automatically.

In terms of time cost, transformation $F_{c,2}$ took much less time than transformations $F_{c,1}$ and $F_{d,1}$ together. In terms of performance, the decision tree created by $F_{c,1}$ and $F_{d,1}$ was found to be slightly more accurate than that resulting from $F_{c,2}$. From further analysis, they learned that (i) handling real values has been a challenge in the automatic generation of decision trees; and (ii) the two researchers did not rely solely on the parallel coordinates plot to choose variables, as their “soft” knowledge about the underlying techniques used in transformations $F_a$ and $F_b$ also contributed to the selection. Such “soft” knowledge reduces the uncertainty expressed by the $\mathcal{D}_{KL}$ term in Eq. (4). This example demonstrates the important role of visualization in model development.

6 Conclusions 

In this paper, we have proposed an information-theoretic measure for offering a mathematical explanation as to what may have been optimized in successful visualization processes. We have used several examples in the literature to demonstrate its explanatory capability for both machine-centric and human-centric transformations in data analysis and visualization. One question that naturally occurs is how one may use such a theoretical measure in a practical environment. We consider this question in three stages.

(i) At present, it is important for us to recognize that the overall objective of data analysis and visualization corresponds to the reduction of Shannon entropy from the original data alphabet to the decisional alphabet. There is a cost associated with this reduction process. It is also necessary to recognize that the benefit of such reduction at each incremental step is likely to be weakened by the uncertainty of an approximated inverse mapping, i.e., the $\mathcal{D}_{KL}$ term in Eq. (4). This uncertainty can be caused by inaccuracy or aggressive aggregation of a machine-centric transformation, as well as by human factors such as visual uncertainty [19] and lack of understanding and experience.

(ii) Next, we can learn from cost-benefit analysis in the social sciences, where quantitative and qualitative methods are integrated to optimize various business and governmental processes in a systematized manner. Once a visualization process is defined as a transformation-based pipeline, we can estimate the cost of each transformation. We should start to define alphabets and estimate the uncertainty measures associated with them.

(iii) Historically, theoretical advancements were often part of a long-term co-evolution with techniques and processes for measurement. This suggests that in the future we will be able to optimize visualization processes in a more quantitative manner. It also suggests that in visualization, empirical studies are not only for evaluating hypotheses but also for collecting measurements that can potentially be used in process optimization.